Lab 5 & 6

Classroom distribution

Lab 5

Lab topics:

Sampling Distribution + CLT

  • Difference between a statistic & a parameter
  • A statistic is a function of the sample, so it is itself a random variable
  • Central Limit Theorem (CLT)

Review:

  • Populations can be at least partially described by population parameters.
  • Statistics or estimators are used to estimate population parameters.
  • The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable.
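
As a rough illustration of the review points above, the sketch below contrasts a parameter with a statistic. It assumes an Exponential population with rate 1, as in the simulations of this lab; the seed and the names `mu` and `sample_means` are illustrative choices, not part of the lab.

```r
set.seed(1)  # arbitrary seed, used only so the sketch is reproducible

rate <- 1
# Population parameter: the mean of an Exponential(rate) population is 1 / rate
mu <- 1 / rate

# Statistic: the sample mean, computed on three independent samples of size 50
sample_means <- replicate(3, mean(rexp(n = 50, rate = rate)))

mu            # fixed number, does not depend on any sample
sample_means  # changes from sample to sample
```

The parameter stays fixed, while the statistic takes a different value on each sample; this variation is what the sampling distribution describes.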

A simple simulation:

# Draw 500 observations from an Exponential(rate = 1) population
x <- rexp(n = 500, rate = 1)
# Histogram on the density scale
hist(x, prob = TRUE, breaks = 20)

Many simulations:

n <- 50
N <- 1000
# Create a matrix which stores the 1000 samples of size 50
X <- matrix(data = rexp(N*n, rate = 1), nrow = N, ncol = n)
# Find the mean of each sample
xbar <- apply(X, MARGIN = 1, FUN = mean)

# Estimated density of the 1000 sample means
plot(density(xbar))
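
One way to check the simulation against the CLT is sketched below. For an Exponential(rate = 1) population the mean and standard deviation are both 1, so the CLT suggests the sample means are approximately Normal with mean 1 and standard deviation 1/sqrt(n). The seed and the names `clt_mean` and `clt_sd` are assumptions made for this sketch, and `rowMeans` is used here as a shortcut for `apply(X, MARGIN = 1, FUN = mean)`.

```r
set.seed(2)  # arbitrary seed, used only so the sketch is reproducible

n <- 50
N <- 1000
rate <- 1
X <- matrix(data = rexp(N * n, rate = rate), nrow = N, ncol = n)
xbar <- rowMeans(X)  # mean of each row, i.e., of each sample

# Values suggested by the CLT for the sampling distribution of the mean
clt_mean <- 1 / rate
clt_sd <- (1 / rate) / sqrt(n)

c(simulated = mean(xbar), clt = clt_mean)
c(simulated = sd(xbar), clt = clt_sd)

# Overlay the approximating normal density on the histogram of sample means
hist(xbar, prob = TRUE, breaks = 20)
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE)
```

The simulated mean and standard deviation should land close to the CLT values, and the normal curve should roughly follow the histogram, even though the population itself is strongly skewed.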

Lab 6

Lab topics:

Inference: Point Estimates

  • Understand Point Estimates
  • Learn relevant R functions (loops)
  • Review Random Variables

Review:

  • An estimator is a function of the sample, i.e., it is a rule that tells you how to calculate an estimate of a parameter from a sample.
  • An estimate is a value of an estimator calculated from a sample.
  • Different estimators are possible for the same parameter.
  • We usually prefer unbiased point estimators, i.e., estimators whose expected value equals the parameter being estimated.
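
To make the review points above concrete, here is a small sketch comparing two estimators of the same parameter: the sample mean and the sample median as estimators of the mean of a Normal population. The values of `n`, `N`, `mu`, and `sigma` are borrowed from the Loops example in this lab; the seed and the names `est_mean` and `est_median` are illustrative assumptions.

```r
set.seed(3)  # arbitrary seed, used only so the sketch is reproducible

n <- 10
N <- 1000
mu <- 105
sigma <- 12

# Each row of X is one sample of size n
X <- matrix(data = rnorm(N * n, mean = mu, sd = sigma), nrow = N, ncol = n)

# Two different estimators of mu, each giving one estimate per sample
est_mean <- apply(X, MARGIN = 1, FUN = mean)
est_median <- apply(X, MARGIN = 1, FUN = median)

# Average of the estimates: both should be close to mu
c(mean = mean(est_mean), median = mean(est_median))

# Spread of the estimates: for Normal data the median is typically more variable
c(mean = sd(est_mean), median = sd(est_median))
```

Each entry of `est_mean` or `est_median` is an estimate; the rule that produced it (`mean` or `median`) is the estimator.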

Some important estimators

  • Sample mean: \(\bar{X} = \frac{X_1+ \cdots+ X_n}{n}\)

  • Sample variance: \(S^2 = \frac{1}{n-1} \sum_{i= 1}^n (X_i - \bar{X})^2\)
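
The two formulas above can be checked by hand against R's built-in `mean()` and `var()`. This is only a sketch: the sample is simulated from a Normal(105, 12) population as an assumption, and the names `xbar` and `s2` are chosen for illustration.

```r
set.seed(4)  # arbitrary seed, used only so the sketch is reproducible

x <- rnorm(10, mean = 105, sd = 12)
n <- length(x)

# Sample mean: (X_1 + ... + X_n) / n
xbar <- sum(x) / n

# Sample variance: sum of squared deviations divided by n - 1
s2 <- sum((x - xbar)^2) / (n - 1)

# Both comparisons should report TRUE, since var() also uses the n - 1 divisor
all.equal(xbar, mean(x))
all.equal(s2, var(x))
```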

Loops:

n <- 10
N <- 1000
mu <- 105 
sigma <- 12
# Empty matrix to hold the samples (one per row) and empty vector for the sample variances
samples <- matrix(NA, nrow = N, ncol = n)
sample.variances <- rep(NA, N)

for (i in 1:N) {
  # Draw the i-th sample of size n from a Normal(mu, sigma) population
  samples[i, ] <- rnorm(n, mean = mu, sd = sigma)
  deviation <- samples[i, ] - mean(samples[i, ])
  # Sample variance with the n - 1 divisor
  sample.variances[i] <- (1 / (n - 1)) * sum(deviation^2)
}

mean(sample.variances)
#> [1] 142.6414
hist(sample.variances, prob = TRUE, breaks = 20)

# create a matrix which stores the 1000 samples of size 10
X <- matrix(data = rnorm(n*N, mean=mu, sd=sigma), nrow = N, ncol = n)
# find the variance of each sample; MARGIN = 1 applies FUN to each row of the matrix X
sample.variances2 <- apply(X, MARGIN = 1, FUN = var)

mean(sample.variances2)
#> [1] 142.0096
plot(density(sample.variances2))
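
As a possible follow-up, the sketch below compares the average of the sample variances with the population variance sigma^2 = 12^2 = 144, and contrasts it with the version that divides by n instead of n - 1. The seed and the names `s2_unbiased` and `s2_biased` are assumptions made for this illustration.

```r
set.seed(5)  # arbitrary seed, used only so the sketch is reproducible

n <- 10
N <- 1000
mu <- 105
sigma <- 12

X <- matrix(data = rnorm(n * N, mean = mu, sd = sigma), nrow = N, ncol = n)

# var() uses the n - 1 divisor, which gives an unbiased estimator of sigma^2
s2_unbiased <- apply(X, MARGIN = 1, FUN = var)

# Rescaling by (n - 1) / n gives the estimator with divisor n, which is biased downward
s2_biased <- s2_unbiased * (n - 1) / n

c(target = sigma^2,
  unbiased = mean(s2_unbiased),
  biased = mean(s2_biased))
```

The average of the n - 1 version should sit near 144, while the divisor-n version has expected value (n - 1)/n * 144 = 129.6, so its average should sit noticeably lower.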